Improving Linpack Performance on SMP Clusters with Asynchronous MPI Programming
Authors
Abstract
This study proposes asynchronous MPI, a simple and effective parallel programming model for SMP clusters, and uses it to reimplement the High Performance Linpack (HPL) benchmark. The proposed model forces the processors of an SMP node to work in different phases, thereby avoiding unnecessary communication and computation bottlenecks. As a result, we achieve significant performance improvements with minimal programming effort. Compared with a de facto flat MPI solution, our algorithm yields a 20.6% performance improvement on a 16-node cluster of dual-processor Xeon SMPs.
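The phase-staggering idea in the abstract can be illustrated with a toy timing model. This is a minimal sketch, not the paper's algorithm or its HPL implementation: the function `makespan`, the single shared network interface per node, the per-iteration barrier in the flat case, and the numbers used (4 time units of compute, 1 of communication, 10 iterations) are all assumptions made for illustration.

```python
def makespan(compute, comm, iters, offsets, barrier):
    """Finish time of processes on one SMP node that share a single NIC.

    Hypothetical toy model: each process alternates a compute phase on its
    own CPU with a communication phase on the shared NIC, which serves one
    process at a time.
    """
    clock = list(offsets)  # current time of each process
    nic_free = 0           # time at which the shared NIC becomes idle
    for _ in range(iters):
        # Each process finishes computing, then asks for the NIC.
        # Within an iteration, requests are served in order of arrival.
        requests = sorted((clock[p] + compute, p) for p in range(len(clock)))
        for ready, p in requests:
            start = max(ready, nic_free)  # wait if the NIC is busy
            nic_free = start + comm
            clock[p] = nic_free
        if barrier:
            # Lockstep schedule: everyone waits for the slowest process,
            # so all processes enter the next phase together.
            clock = [max(clock)] * len(clock)
    return max(clock)


# Flat, same-phase schedule: both processes hit the NIC at the same time.
flat = makespan(4, 1, 10, offsets=[0, 0], barrier=True)

# Staggered schedule: the second process starts one communication phase
# later, so one process computes while the other communicates.
staggered = makespan(4, 1, 10, offsets=[0, 1], barrier=False)

print(flat, staggered)
```

Under these assumed numbers the staggered schedule finishes earlier because the two processes never wait for the network interface at the same time. The size of the gap depends entirely on the made-up parameters and says nothing about the 20.6% figure reported above.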
Similar resources
Asynchronous Parallel Programming Model for SMP Clusters
Our study proposes a novel MPI-only parallel programming model with improved performance for SMP clusters. By rescheduling tasks in a typical flat MPI solution, our model forces processors of an SMP node to work in different phases, thereby avoiding unnecessary communication and computation bottlenecks. This study achieves a significant performance improvement with a minimal programming effort...
Performance Impact of Process Mapping on Small-Scale SMP Clusters - A Case Study Using High Performance Linpack
Typically, a High Performance Computing (HPC) cluster loosely couples multiple Symmetric MultiProcessor (SMP) platforms into a single processing complex. Each SMP uses shared memory for its processors to communicate, whereas communication across SMPs goes through the intra-cluster interconnect. By analyzing the communication pattern of processes, it is possible to arrive at a mapping of process...
Optimization for Hybrid MPI-OpenMP Programs on a Cluster of SMP PCs
This paper applies a Hybrid MPI-OpenMP programming model with a thread-to-thread communication method on a cluster of Dual Intel Xeon Processor SMPs connected by a Gigabit Ethernet network. The experiments include the well-known HPL and CG benchmarks. We also describe optimization techniques to get a high cache hit ratio with the given architecture. As a result, the hybrid model shows performan...
Performance Characteristics of Intel Architecture-based Servers
Computing clusters built from standard components using Intel® processors are becoming the fastest growing choice for high-performance computing (HPC). Twice yearly, the 500 most powerful computing systems in the world are ranked on the TOP500 Supercomputer Sites Web page. In November 2002, the ranking listed 56 entries using Intel processors; by June 2003, that number reached 119. Today, three...
A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications
As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, however, require a central storage for storing checkpoints. This severely limits the scalability of checkpointing. We propose a scalable replication-based MPI checkpointing facility that is based on LAM/MPI. We extend...